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ABSTRACT 

We present evidence that multigrid works for wave equations in disordered systems, e.g. in the 
presence of gauge fields, no matter how strong the disorder, but one needs to introduce a "neural 
computations" point of view into large scale simulations: First, the system must learn how to do 
the simulations efficiently, then do the simulation (fast). The method can also be used to provide 
smooth interpolation kernels which are needed in multigrid Monte Carlo updates. 
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There is a stochastic multigrid method and a deterministic one. The stochastic version is 
used to compute high dimensional integrals in Euclidean quantum field theory or statistical 
mechanics by a Monte Carlo method which uses updates at different length scales []T], |^. The 
deterministic version 0, ^ solves discretized partial differential equations. One hopes to use 
both of them in simulations of lattice QCD, for updating the gauge fields and for computing 
fermion propagators in given gauge fields. In either case the aim is to beat critical slowing 
down in nearly critical systems, i.e. to maintain fast convergence when long range correlations 
appear. 

A crucial problem is how to define and exhibit smooth functions in the disordered context, 
i.e. when translation symmetry is strongly violated. We will present a method how to solve 
this problem. 

We recommend to think of more general disordered systems than gauge theories. This puts 
the core of the problem into sharper focus, and it opens the way to other possible applications 
such as low lying states of spin glasses, the shape of a lightning, waves on fractal lattices (with 
bond percolation), localization of low lying electronic states in amorphous materials, etc. 
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1. The Multigrid 

One starts from a problem on a given "fundamental" lattice A'' of lattice spacing uq. One 
introduces a sequence of lattices A^, A^, A^ of increasing lattice spacings aj = Lla^, together 
with interpolation operators and restriction operators which map functions on coarser 
lattices into functions on finer lattices, and vice versa. Let W be the space of functions on 
lattice A-'. Then we need 

A^ : 'H-' ^ Ti"'^^ : interpolation operator (1-1) 
: 'H-'^^ ^ W : restriction operator (1-2) 

Given the interpolation operators, one can use them to define restriction operators. The choice 
= A^* is made in "variational coarsening" |0] . 

Typically, we choose = 2, and a last layer A^ which consists of a single point. 

2. The Basic Importance of Smoothness 
2.1. Deterministic multigrid 

One wants to solve a discretized elliptic differential equation on A" 

D,e = f. (2.1) 

It might have arisen from an eigenvalue equation Dq^^ = e^^ by inverse iteration [^]. If Dq has 
a small eigenvalue, then local relaxation algorithms suffer from critical slowing down. 

Basic observation (in the "ordered case" 0, Q)- After some (damped) relaxation sweeps 
on A*^ one gets an approximate solution whose error e*^ = — is not necessarily small but 
is smooth (on length scale ao). The unknown error e° satisfies the equation 

Doe° = r°. (2.2) 

It involves the residual r^ = p — Dq^^ which would be zero for an exact solution. 

Given that e" is smooth, it can be obtained by smooth interpolation of a suitable function 
on A^, 

e° = A^e' . (2.3) 

That is, e° = J^xga^ ^L^x with Al^ which depends smoothly on z. 

Now define a restriction operator such that interpolation followed by restriction amounts 
to doing nothing, i.e. C^A^ = 1. (For instance = {A^*A^)-^A^*). Then (2.3) can be 
inverted, = C^e° . Applying to both sides of (2.2) yields an equation for e\ 

Die^ = r^ , (2.4) 

with = C^r° (restricted residual) and Di = C^DqA^ (effective differential operator). Given 
e^, one obtains from (2.3), and C,^ + e^ is an improved solution of (2.1). Thxis,the problem has 
been reduced to an equation on the lattice which has fewer points. If necessary, one repeats 
the procedure, moving to A^ etc. The procedure stops, because an equation on a "lattice" A^ 
with only a single point is easy to solve. 

The iterated interpolation A^'^^^ = A^A^.-.A^ from A-' to A° should yield functions on A° 
which are smooth on length scale aj, i.e. which change little over a distance aj (in the ordered 
case). For reasons of practicality, one must require that 

Ai^ = unless z is near x . (2.5) 

Example: The optimal choice for the 1-dimensional Laplace equation is ^ 

Axx = '^ ,Ax±i,x = ^ ,others = 0. (2.6) 
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We chose to regard as a sublattice of A" and set Oq = 1. 

This interpolation has the property that it maps constant functions on A^ into constant 
functions on A°. Constant functions are the lowest eigenmodes of the Laplacian. Di comes out 
proportional to a Laplacian again. 

2.2. Stochastic multigrid 

The successful stochastic multigrid updating method 0, ^ for O(A^), CP^ and SU{N) x 
SU{N) spin models was described in Wolff's lecture p. One uses updates of spins s{z) at sites 
z E of the form 

s{z) = e'^^^'^^ s{z) (2.7) 

where the matrix A is a generator of a group of transformations which can act on spins s, and 
where A^^l vanishes outside a neighborhood of diameter of order aj of x. One may assume a x 
to take values in a lattice A-'. It is important, though, that the supports of Af^^ in z should 
overlap for adjacent x. 

This procedure eliminates critical slowing down in the spin models almost completely, pro- 
vided on chooses to be smooth in z on length scale j. 

For general models, a sufficiently high acceptance rate for nonlocal updates like (2.7) is 
necessary to eliminate critical slowing down. Finn and Grabenstein show evidence that 
smoothness alone is not in general enough to ensure this. Their criterion demands also non- 
appearance of mass terms. In the spin models this is true. Fure gauge theories have no bare 
mass parameter. But an effective mass is present in the following "weak coupling" multigrid 
updating scheme for gauge fields in 4 dimensions, with gauge group G, cp. [§. Suppose one 
updates only gauge fields attached to links which point in one selected direction - call it the 
vertical direction. One can regard these variables as spins, and perform updatings like (2.7), 
with A that are smooth in an appropriate sense - cp. later. Only variables residing in the same 
horizontal 3-dimensional sublattice are actually coupled, therefore one effectively does updat- 
ings in 3-dimensional Higgs-models, with action —l3s{z)A-^s{z). The covariant Laplacian A-*- 
depends on a 3-dimensional G x G-gauge field that is determined by the gauge field variables of 
the model which are attached to horizontal links. The lowest eigenvalue of A-*- has dimension 
mass squared, and is strictly positive for generic gauge fields. It is determined by variables that 
are not updated and is therefore like a parameter. 

Experience with 0^-theory suggests that such a "weak coupling" approach might neverthe- 
less lead to a substantial acceleration in practise. This is under investigation. 

3. Smoothness in Disordered Systems 

From section 2 we learn that a successful multigrid scheme, whether deterministic or stochas- 
tic, needs smooth interpolation kernels A. This raises the basic 

Question: What is a smooth function in the disordered situation, for instance in an external 
gauge field? 

Naive smoothness of a function ^ means that Z]/^(V^^, V^(^) ^ iCiO^ where are dis- 
cretized ordinary derivatives. But this is not gauge covariant. A tentative remedy would be to 
take the covariant derivative V^. But 

E(Vm^, v^o = (e, -AO > eo(e, • (3.1) 

The lowest eigenvalue Eq of the negative covariant Laplacian —A is a measure for the disorder 
of the gauge field. (It is positive and vanishes only for pure gauges.) By definition, it is not 
small for disordered gauge fields. Therefore there are no smooth functions in this case. 

°In most simulations, it was actually chosen at random. 
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Nevertheless there is an answer to the question, assuming a fundamental differential operator 
Do is specified by the problem. In the stochastic case, the Hamiltonian often provides Dq. 
Answer: A function ^ on A° is smooth on length scale a when 



<^ Uf in units a = l . 



(3.2) 



We found that a deterministic multigrid which employs interpolation kernels A'^^^^ from A-' 
to the fundamental lattice A*^ which are smooth in this sense, works for arbitrarily disordered 
gauge fields - see later. 

When there are no smooth functions in this sense at length scale uq, then Dq has no low 
eigenvalue, and there is no critical slowing down and no need for a multigrid. 



The above answer appears natural, and the "projective multigrid" of |]T0|, |TT| is in its spirit. 
But there are subtleties, and there is the question of how to obtain kernels Af^^ which are 
smooth on length scale aj. 

Problem: One needs approximate solutions of eigenvalue equations 



DoA^ = £o(a:)4°. 



(3.3) 



X is fixed, and Dq acts on z. Since A'^^^^ is required to vanish for z outside a neighbourhood of 
X, the problem involves Dirichlet boundary conditions. 

For large j, A^^^^ will have a large support. If there is no degeneracy in the lowest eigenvalue, 
one can use inverse iteration combined with standard relaxation algorithms for the resulting 
inhomogeneous equationcl. But this and other standard methods will suffer from critical slowing 
down again. 

Moreover, in the standard multigrid setup, one uses basic interpolation kernels A^ which 
interpolate from one grid A-^ to the next finer one. In this case 



A[Oj] - A^A-i Aj 

and (3.3) becomes a very complicated set of nonlinear conditions. 
Possible solutions are 



(3.4) 



(i) Replace (3.3) by minimality of a cost functional (cp. later). Use neural algorithms to find 

kernels A^ which minimize it. This is still under study. 

(ii) Give up factorization (3.4) and determine independent kernels A^^^^ as solutions of (3.3) by 

multigrid iteration. This is done successively for j = 1,2,... One uses already determined 
kernels A'^^'^^ with k < j for updating A^^^^. We found that this works very well - cp. later. 

Method (ii) is not quite as efficient as standard multigrid methods (MG) for ordered systems 
because of the overhead for storing and computing the kernels. Assuming convergence as 
expected, algorithms compare as follows for a lattice A° of L'^ sites in d dimensions. (The 
overhead is included): 



Algorithm 


work (flops) 


storage space 


local relaxation 




L'^+%z ^ 2) 




MG ["ordered"] 








MG ["disordered". 


(i)] 


??? 




MG ["disordered". 


(ii)] 


L'^ In^ L 


L'^ InL 



''Basically one computes an approximation to Dq ^J^start- 



4. Criteria for Optimality 

We consider iterative solution of a discretized partial differential equation (2.1). Any itera- 
tion amounts to updating steps of the form 

^'^i" = gi' + a f . (4.1) 

g is called the iteration matrix, and cr = (1 — q)Dq^ . The convergence is governed by the norm 
II ^11 of the iteration matrix. The iteration converges if ||^|| < 1, and its relaxation time is 

In II ^11 

Parameters in the algorithm - such as interpolation kernels A{^, restriction operators C^^ and 
effective differential operators Dj - are optimal if the cost functional -E = || ^|p is at its minimum. 

Example: Consider a twogrid iteration in which a standard relaxation sweep on A° with 
iteration matrix qq is followed by exact solution of the coarse grid equation (2.4). The second 
step leads to an updating with some iteration matrix ^i, and q = QoQi- Therefore one may 
estimate E < \\Dq qqW^ Ei with Ei = \\Dq^ QiW^- This form of the estimate 0] is motivated by 
the fact that the fine grid relaxation smoothens the error, but does not converge fast. Therefore 
1 1 Do ^'oll is suppressed, whereas ||^o|| is not much smaller than 1. 

Only El depends on the interpolation kernels etc. Therefore one can try to optimize these 
parameters by minimizing Ei. 

Using the trace norm, ||^|p = tr gg*, one finds 

Ei = Volume"^ J2 l^.^'l^ ^i^^ T = D^^ - D^^ . (4.3) 



Prescribing C^, and determining D\ and by minimizing E\ yields what we call the "ideal 
interpolation kernel" A} for a given restriction map . Kalkreuter did twogrid iterations 
for (2.1) in 4 dimensions with Dq as shown in section 6 below, using ideal interpolation ker- 
nels [|1^. He found absence of critical slowing down for arbitrarily disordered S"?/ (2) -gauge 
fields. This showed for the first time that multigrid could work in principle for arbitrarily 
strong disorder. The ideal kernel A}^^ is impractical for production runs, though. This is 
because it has exponential tails instead of vanishing for z outside a neighbourhood of x. 

In the analytic multigrid approach to Euclidean quantum field theory |]1|, F is known as the 
fluctuation field or high frequency field propagator. It has an infrared cutoff which makes it 
decay exponentially with distance |z — ti'l. Typically, the stronger the decay, the smaller E^- 

5. Neural Multigrid 

A feed-forward neural network can perform the computations to solve (2.2) by multigrid 



relaxation. The nodes of the network ("neurons") are identified with points of the multigrid. 
There are two copies of the multigrid, except that the last layer A^ is not duplicated. In the 
standard multigrid approach, the basic interpolation kernels A^ interpolate from one layer A-' 
of the multigrid to the preceding one, A-^~^. In this case the network looks like in fig. 5.1. Each 
node is connected to some of the nodes in the preceding layer in the neural network. In the 
upper half, the connection strength from x G A-' to z G A-'^^ is A\^. In the lower half, node 
z G A-'"^ is connected to x G A-' with strength B?^^. In addition there is a connection of strength 
ujjdjl {dj^x = {E)j)xx, i^j = damping parameter on A-') between the two nodes which represent 
the same point z in A-', j < N. These connections model the synapses in a brain. According 
to Hebb's hypothesis of synaptical learning, the brain learns by adjusting the strength of its 
synaptical connections. 
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Figure 5.1: A feed-forward neural multigrid architecture (selected connections). 



The network receives as input an approximate solution ^ of (2.1), from which the residual 
J.0 — jO _ jg then determined. It computes as output an improved solution = ^ + 6^. 
The desired output ("target") is C = 

is a linear function of r . 

Except on the bottom layer, each neuron receives as input a weighted sum of the output 
of those neurons below it in the diagram to which it is connected. The weights are given by 
the connection strengths. Our neurons are linear because our problem is linear. The output of 
each neuron is a linear function of the input. (One may take output = input). 

The result of the computation is 

5^ = (a;o d^' + E ^fc 4"' R^''^) (5.1) 
fc>i 

where R^''^^ = R^...R^E^ and A^^'^ = A^A'^...A''. 

In principle, R^ is determined by the restriction operators and by the effective differential 
operator, R^ = C^{1 — Uj^iDj_id~}i). But actually, the restriction kernels , effective differ- 
ential operators Dj {j > 0) and their diagonal part dj, and the damping parameters Uj {j > 0) 
enter only in the combination R^ (assuming ujjd~]. can be scaled to 1). They are therefore not 
needed separately. 

The fundamental differential operator Dq and its diagonal part d^ are furnished as part of 
the problem. The connection strengths ("synaptical strengths") A{^, Ri^ (and possibly the 
damping factor uq) need to be found by a learning process in such a way that the actual output 
is as close as possible to the desired output. 



In supervised learning of a neural network |jT3|, a sequence of pairs {^^X'^) is presented 
to the network. Given input ^f^, the actual output is compared to the target C^, and the 
connection strengths are adjusted in such a way that the cost functional 

E = j2\\o^ -c^f 

gets minimized. If an iterative procedure to achieve this minimization is specified, one calls 
this a learning rule. 

Because of linearity, it suffices to consider the equation Dq^ = p in the limit of small /°. 
In the limit i— > 0, the target = for any input, and = Q (4-1), with q = iteration 

matrix. 
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Taking for the sequence a complete orthonormal system of functions on A°, 

E = minimum is the previous optimahty condition of section 4 for multigrid relaxation. 

Conclusion: Optimizing the kernels A and R is a standard learning problem for a feed 
forward neural net. 

6. Learning Rule Performance 

The second variant, for which a learning rule was already described in section 3, involves 
a slightly different neural net. Instead of the connections between neighbouring layers of the 
multigrid, we have now connections from A*^ to A'^ with connection strength Cl^^\ and from A-' 
to A'' with connection strength A^^^^. We adopt variational coarsening, 

connection strengths are determined by interpolation kernels A^^''^ which have to be learned. 
The damping factors cuk were set to 1, and dk is the diagonal part of Dk as before, with 

= ^[0fcl*i)j3^m . (6.1) 

The learning rule of section 3 requires a process of "hard thinking" by the neural net. 
Neurons which have learned their lesson already - i.e. which have their synaptical strengths 
fixed - are used to instruct the rest of the neural net, adjusting the synaptical strengths of the 
next multigrid layer of neurons. 

A checkerboard variant of this algorithm was tested in 2 dimensions, using SU(2)-gauge 
fields which were equilibrated with standard Wilson action at various values of P, and Dq = 
—A — So + Srn^. Eq is the lowest eigenvalue of the covariant Laplacian —A, and Sm^ > 0. 
Conventional relaxation algorithms for solving (2.1) suffer from critical slowing down for such 
Dq, for any volume and small dm^. 

It turned out that it was not necessary to find accurate solutions of the eigenvalue equation 
for the interpolation kernels A^'^^^. An approximation A^^^^ to (— A)""^^^, was computed. A total 
of four relaxation sweeps through each multigrid layer below j {n = 2 inverse iteration steps 
a one V-cycle each) was enough. The convergence rate of the following (^-iteration is shown in 
fig. 6.1 for P — 1.0. The correlation time r is in units of MG-iterations. One MG-iteration 
involved one sweep through each multigrid layer, starting with j — 0. Updating ^ at x e A-' 
changes ^ by 

Sweeps were actually performed in checkerboard fashion. 
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Figure 6.1: Correlation time r as function of the lowest eigenvalue Srri^ in a representative gauge 
field configuration equilibrated at /? = 1.0. For the 64^ lattice, the correlation time fluctutates 
very little with the gauge field configuration. 
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